\chapter{Conclusion And Future Work}


\section{Conclusion}

In this report, we reviewed techniques for handling speaker variability in the two categories of widely used ASR systems: the traditional GMM-HMM based ASR and the hybrid DNN-HMM ASR. Various adaptation
and normalization techniques for these two kinds of ASR were investigated. Experiments on WSJ have shown that, for a DNN-HMM system, LHN adaptation combined with techniques such as eigenvoices yields
considerable improvements. We also highlighted the importance of gaining deeper insight into these methods within a common framework. Finally, we listed the research questions that we would like to investigate in the future.



\section{Future Work}

To move forward on this research problem, we believe it is necessary to formulate a framework for handling speaker variability in DNN acoustic models. One major limitation of current research on speaker adaptation
of DNNs is the insufficient understanding of how speaker variability is encoded in the model. Furthermore, different adaptation techniques are evaluated on different datasets, which makes a
comprehensive comparison between them difficult. Therefore, it is worthwhile to apply different adaptation techniques to models trained on the same dataset and investigate the behaviour of speaker variability under
each technique. The sensitivity of hidden units to speaker variability will be used as the measure.
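As a concrete, but purely illustrative, instantiation of such a measure, the sketch below computes, for every unit of a given hidden layer, the ratio of between-speaker variance to total variance of its activations. The function name, the variance-ratio formulation, and the assumed frame-level inputs are working assumptions for illustration, not a fixed design.

\begin{verbatim}
import numpy as np

def speaker_sensitivity(activations, speakers):
    """Per-unit sensitivity to speaker variability (illustrative).

    activations: (n_frames, n_units) hidden-layer outputs for one layer.
    speakers:    (n_frames,) speaker label of each frame.

    Returns a (n_units,) array: ratio of between-speaker variance to
    total variance of each unit's activation (higher values indicate
    more speaker-sensitive units).
    """
    total_var = activations.var(axis=0) + 1e-12
    grand_mean = activations.mean(axis=0)

    between = np.zeros(activations.shape[1])
    for spk in np.unique(speakers):
        frames = activations[speakers == spk]
        weight = len(frames) / len(activations)
        between += weight * (frames.mean(axis=0) - grand_mean) ** 2

    return between / total_var

# Layer-wise trend of speaker sensitivity (hypothetical usage):
# for layer, acts in enumerate(layer_activations):
#     print(layer, speaker_sensitivity(acts, speakers).mean())
\end{verbatim}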

In the near future, our research will focus on answering the following research questions.

\begin{enumerate}
 \item How does speaker variability change across the hidden layers?
 \item Which portions (layers) of the network would be good candidates for speaker adaptation? (A sketch of these insertion points is given after this list.)
 \begin{itemize}
  \item Should we retrain the entire network?
  \item As a Linear Input Network (LIN)?
  \item As a Linear Hidden Network (LHN) between lower hidden layers?
  \item As a Linear Hidden Network (LHN) between top hidden layers?
  \item As a Linear Output Network (LON)?
 \end{itemize}
    
  \item What is really happening in terms of speaker sensitivity during adaptation?
  \item What is happening when input features are augmented with speaker representations? (See the augmentation sketch after this list.)
  \begin{itemize}
   \item Does the model learn to classify acoustic units independently of the speaker?
   \item Does the model encode the speaker information as well as the acoustic information?
  \end{itemize}
  
  \item Is it possible to find a structure for the parameters of a DNN acoustic model?
  \begin{itemize}
   \item Can we utilize the sparseness of DNN parameters in finding the structure?
   \item Can we use that structure to develop more advanced adaptation techniques?
  \end{itemize}
  
\end{enumerate}
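To make the candidate insertion points in question 2 concrete, the following sketch shows how an identity-initialised linear adaptation layer could be inserted at different depths of a speaker-independent network, assuming the network is built as a PyTorch Sequential module. The helper name and this structural assumption are illustrative; the sketch is not a fixed part of our method.

\begin{verbatim}
import torch
import torch.nn as nn

def insert_adaptation_layer(si_model, position, width):
    """Insert an identity-initialised linear layer into an nn.Sequential.

    position == 0                 -> Linear Input Network (LIN)
    0 < position < len(si_model)  -> Linear Hidden Network (LHN)
    position == len(si_model)     -> Linear Output Network (LON)

    Only the inserted layer remains trainable; all speaker-independent
    parameters are frozen.
    """
    adapt = nn.Linear(width, width)
    with torch.no_grad():
        adapt.weight.copy_(torch.eye(width))   # start as the identity map
        adapt.bias.zero_()

    for p in si_model.parameters():            # freeze the original model
        p.requires_grad = False

    layers = list(si_model.children())
    layers.insert(position, adapt)
    return nn.Sequential(*layers)

# LIN:        insert_adaptation_layer(si_model, 0, n_inputs)
# lower LHN:  position just after the first hidden layer's activation
# upper LHN:  position just before the output layer
# LON:        insert_adaptation_layer(si_model, len(si_model), n_outputs)
\end{verbatim}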
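For question 4, a minimal sketch of the feature augmentation itself is given below, assuming a fixed per-speaker representation (for example, an i-vector or a learned speaker code) that is appended to every acoustic frame of that speaker. The function name and the array shapes are hypothetical.

\begin{verbatim}
import numpy as np

def augment_with_speaker_vector(frames, speaker_vector):
    """Append a per-speaker representation to every acoustic frame.

    frames:         (n_frames, feat_dim) acoustic feature vectors
    speaker_vector: (spk_dim,) representation of the current speaker
    Returns:        (n_frames, feat_dim + spk_dim) augmented features
    """
    tiled = np.tile(speaker_vector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)
\end{verbatim}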



